The lambda phage is a paradigm temperate bacteriophage. Its lysogenic and lytic life cycles reflect the
competition between the DNA-binding CI and CRO proteins. Here we address the Physics of this
transition in terms of an energy function that portrays the backbone as a multi-soliton configuration.
The precision of the individual solitons far exceeds the B-factor accuracy of the experimentally
determined protein conformations, giving us confidence to conclude that three of the four loops are
each composites of two closely located solitons. The only exception is the repressive DNA-binding
turn, which is the sole single-soliton configuration of the backbone. When we compare the solitons with
the Protein Data Bank we find that the one preceding the DNA recognition helix is unique to the
CI protein, prompting us to conclude that the lysogenic to lytic transition is due to a saddle-node
bifurcation involving a soliton-antisoliton annihilation that removes the first loop.

The CI repressor protein and its lytic counterpart, the CRO protein of the Escherichia coli binding λ phage, are
among the most extensively studied proteins in molecular biology [1]. They display a highly intriguing biological
behavior by controlling the transitions between the lysogenic and lytic phases, which has been detailed in numerous
molecular biology textbooks and review articles. At a qualitative, mechanistic level the transition between the
lysogenic and lytic phases is quite well understood [1], [2]. But at a quantitative level we do not yet understand the
physical principle that triggers the transition. Since lysogeny is an important example of gene control by repressors,
a quantitative Physics based explanation should have wide biophysical interest and applicability.

In this Letter we search for a Physics based explanation for the transition between lysogenic and lytic phases in the
λ phage. For this we scrutinize the fine structure of the folded CI repressor protein, a homo-dimer with 92 residues
in each of the two monomers. The protein binds to DNA with a helix-turn-helix motif that is located between the
residue sites 33-51. Full crystallographic information is available in the Protein Data Bank (PDB) under code 1LMB.

We construct the 1LMB backbone conformation explicitly in terms of solitons that emerge as classical solutions to
the following energy function
The summation extends over all residues, with $\kappa_i \in [-\pi, \pi] \bmod 2\pi$ the bond angle along the lattice that is formed
by the central Cα carbons, and $\tau_i \in [-\pi, \pi] \bmod 2\pi$ the ensuing torsion angle. The parameters (c, m, b, d, e) are
all global and specific to a given motif. Once these angles are known, we can use the discrete Frenet equation to
reconstruct the protein backbone as a piecewise linear polygonal chain.
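The reconstruction step can be illustrated with a short sketch. It assumes one common convention for the discrete Frenet frame rotation, a fixed Cα-Cα bond length of 3.8 Å, and an arbitrary initial frame; none of these choices is spelled out in the text, and the function names are hypothetical.

```python
import math

def frenet_step(frame, kappa, tau):
    """Rotate the frame (n, b, t) by bond angle kappa and torsion angle tau.

    The rotation matrix below is one common convention for the discrete
    Frenet equation (an assumption, not taken from the text).
    """
    n, b, t = frame
    ck, sk = math.cos(kappa), math.sin(kappa)
    ct, st = math.cos(tau), math.sin(tau)
    rows = (
        (ck * ct, ck * st, -sk),   # new normal
        (-st, ct, 0.0),            # new binormal
        (sk * ct, sk * st, ck),    # new tangent
    )
    return tuple(
        tuple(r[0] * n[k] + r[1] * b[k] + r[2] * t[k] for k in range(3))
        for r in rows
    )

def reconstruct_backbone(kappas, taus, bond_length=3.8):
    """Build a piecewise linear chain from bond and torsion angles.

    Returns len(kappas) + 2 vertices: each vertex is the previous one
    shifted by bond_length along the current tangent vector.
    """
    # Arbitrary initial frame: the chain starts at the origin along z.
    frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    points = [(0.0, 0.0, 0.0)]
    for kappa, tau in zip(kappas, taus):
        t = frame[2]
        last = points[-1]
        points.append(tuple(last[k] + bond_length * t[k] for k in range(3)))
        frame = frenet_step(frame, kappa, tau)
    # Final bond along the last tangent.
    t = frame[2]
    last = points[-1]
    points.append(tuple(last[k] + bond_length * t[k] for k in range(3)))
    return points

# Illustrative angles only (not fitted to 1LMB).
chain = reconstruct_backbone([1.5, 1.5, 1.5], [1.0, 1.0, 1.0])
```

With this convention the angle between two consecutive bond vectors equals the corresponding bond angle, and all bond angles equal to zero give a straight chain.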

We emphasize that the energy function (1) does not purport to explain the details of the atomic-level mechanisms
that fold the CI protein. Rather, it enables us to examine the properties of the folded CI in terms of universal physical
arguments.

Curiously, (1) has the functional form of the discretized Landau-Ginzburg free energy that similarly describes the
Physics of superconductivity: in a continuum limit the first two terms of (1) combine into the derivative of curvature
that plays the role of Cooper pair density in the Landau-Ginzburg theory. The third term is the symmetry breaking
potential. The fourth term has its origin in spontaneous symmetry breaking; its presence leads to the notorious
Meissner effect in superconductivity [7]. The fifth term, which is absent in the standard Landau-Ginzburg free energy,
is the Chern-Simons term that gives the protein backbone its chirality. Finally, the last term is like the Proca mass
of a supercurrent. In fact, in (1) we have included exactly all those terms that are consistent with general principles
of universality and gauge invariance [3].

We start by introducing the classical equations of motion for (1). We first eliminate $\tau_i$ in terms of the bond angles
FIG. 1: The number of entries in PDB with temperature below 50 K vs. Debye-Waller fluctuation distance.



Consequently the torsion angles are determined entirely by the bond angles and the two parameter ratios. When we
substitute (2) into the equation for $\kappa_i$, we arrive at

$$\kappa_{i+1} - 2\kappa_i + \kappa_{i-1} = U'[\kappa_i]\,\kappa_i \qquad (i = 1, \ldots, N) \tag{3}$$

(with $\kappa_0 = \kappa_{N+1} = 0$) where

$$U[\kappa] = -\frac{d}{2}\,\tau[\kappa] - 2cm^2\,\kappa^2 + c\,\kappa^4 \tag{4}$$

Since the torsion angle depends only on the two parameter ratios, if we scale d, e and b equally the profile of
$\tau[\kappa]$ remains intact. In the limit where d becomes vanishingly small the first term in (4) can then be safely removed,
and the equation reduces to the ubiquitous spontaneously broken discrete nonlinear Schrödinger equation [5]. Thus,
in the $d \to 0$ limit the solution approaches the soliton profile of the ensuing continuum equation [5],

Here r labels the different helix-loop-helix motifs of 1LMB, and $(c_{r1}, c_{r2}, m_{r1}, m_{r2}, s_r)$ are specific to the motif; the
parameter $s_r$ that is absent in (1) specifies the location of the rth loop. The parameters $(c_{r1}, c_{r2})$ characterize the
length of the loop, and $(m_{r1}, m_{r2})$ together with the two parameter ratios determine the global character of the helices and
strands that are adjacent to the loops. Remarkably this leaves us with no other loop specific parameters besides the
$c_r$ that determine the length of the loops, and the $s_r$ that determine their positions.

We propose that, as such, the classical soliton profiles describe the Cα lattice only in the limit where thermal
fluctuations vanish. But even near zero temperature the protein remains subject to residual zero-point fluctuations.
It is difficult to estimate and even harder to accurately calculate the amplitude of these zero-point fluctuations.
Therefore, to get a realistic order of magnitude estimate we have inspected the distribution of the B-factors
that characterize experimental uncertainties, for all PDB structures where the crystallographic measurements have
been made at temperatures less than 50 K. The result is displayed in Figure 1. From it we conclude that for the Cα
carbons the zero-point fluctuations have an amplitude somewhere in the vicinity of the lower bound, which is around
0.15 Å. Consequently we describe the estimated range of zero-point fluctuations around our classical soliton profiles
by dressing them with a tubular domain that has a radius of 0.15 Å.
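As a rough illustration of how a B-factor translates into a fluctuation distance, the sketch below uses the standard isotropic Debye-Waller relation $B = 8\pi^2 \langle u^2 \rangle$. That this is the exact relation behind Figure 1 is an assumption, and the function names are hypothetical.

```python
import math

def debye_waller_distance(b_factor):
    """Fluctuation distance (in Angstrom) for an isotropic B-factor (in Angstrom^2).

    Assumes the standard relation B = 8 * pi^2 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def b_factor_from_distance(distance):
    """Inverse relation: the B-factor that corresponds to a given distance."""
    return 8.0 * math.pi ** 2 * distance ** 2

# A tube radius of 0.15 Angstrom corresponds to a B-factor of roughly 1.8 Angstrom^2.
tube_b_factor = b_factor_from_distance(0.15)
```

Under this relation a B-factor near 20 Å² gives a distance of about 0.5 Å, which is why a 0.15 Å tube is much narrower than typical room-temperature crystallographic uncertainties.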
TABLE I: Parameter values for each of the seven solitons in Figure 2.
FIG. 2: The seven solitons for the first monomer of 1LMB, with their respective residue numbers. The black line denotes
the distance between soliton and corresponding PDB configuration. The red line denotes the Debye-Waller distance that is
computed from the B-factors in PDB. The grey area describes the estimated 0.15 Å zero-point fluctuation distance of the
soliton.



In Table I we provide the parameter values computed from the PDB data of 1LMB using a Monte
Carlo fitting algorithm. In Figure 2 we compare the ensuing solitons with the folded 1LMB: the solitons describe the
structural motifs of 1LMB with a precision that is substantially better than the experimental accuracy determined by
B-factors, even when we account for the 0.15 Å estimate of the solitons' zero-point fluctuations.

With the aid of our high accuracy solitons we conclude that in 1LMB there are a total of seven α-helices and one
β-strand. But two of the α-helices and the sole β-strand are so short that until now they have been interpreted as
parts of loops. They become exposed only by the high accuracy of our construction. This refinement of the consensus
interpretation has important, fully testable repercussions for the CI protein that allow us to address the Physics of the
lysogenic to lytic transition:

The only motif where our soliton picture identifies a loop as a single isolated soliton is the DNA binding one. All
of the remaining three putative loops consist of a soliton-antisoliton pair, with the solitons separated from each other
either by a very short α-helix in the case of the residues (23, 33) and (69, 90), or by a very short β-strand in the case of residues
(51, 61). This interpretation reveals itself only when we scrutinize the fine details of the $(\kappa_i, \tau_i)$ spectrum in terms of
our solitons. As an example, in Figure 3 we display the putative first helix-loop-helix motif and for comparison we
display the corresponding structure in the CRO protein with PDB code 2OVG. Our refined interpretation is palpable:
in the case of CI the motif is clearly a bound state of two solitons, while in the case of CRO we have a single isolated
soliton.

FIG. 3: The resolved $(\kappa_i, \tau_i)$ spectrum for the putative first helix-loop-helix of 1LMB (left) and the corresponding structure
in 2OVG (right). The bond angle $\kappa$ is black, torsion angle $\tau$ is red. The bond angle spectra reveal that in 1LMB the loop is a
bound state of two solitons, while in 2OVG there is only one soliton.

The parameter $s_r$ of the soliton profile determines the center of the soliton, i.e. the position of the inflection point in the ensuing
space curve where the interpolated bond angle in Figure 3 vanishes. An isolated inflection point such as the one in
the right hand side of Figure 3 (2OVG) is topologically stable in the sense that it cannot be created or removed
by any continuous local deformation. For a given finite length curve, an individual inflection point, i.e. a soliton, can
be made or deleted only by transporting it through one of the end points of the curve. On the other hand, a pair
of inflection points, i.e. a soliton-antisoliton pair such as the one in the left hand side of Figure 3 (1LMB), is not
topologically stable but can be created or removed locally by a saddle-node bifurcation that brings the two inflection
points together.
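The topological argument can be made concrete with a minimal sketch that counts inflection points as sign changes in a bond angle sequence. The profiles below are made-up illustrative numbers, not the 1LMB or 2OVG data, and the function name is hypothetical.

```python
def count_inflection_points(kappas):
    """Count sign changes along a sequence of bond angles.

    Each sign change marks a place where the interpolated bond angle
    passes through zero, i.e. an inflection point of the curve.
    Exact zeros are skipped so that they do not split one crossing in two.
    """
    signs = [1 if k > 0 else -1 for k in kappas if k != 0.0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Illustrative profiles (assumed values, for demonstration only).
single_soliton = [1.5, 1.5, 1.2, -1.2, -1.5, -1.5]   # one crossing
soliton_pair = [1.5, 1.5, -1.2, -1.5, 1.2, 1.5]      # two crossings
# A local deformation of the pair profile with the end values held fixed:
annihilated = [1.5, 1.5, 0.4, 0.3, 1.2, 1.5]         # no crossing left

counts = [count_inflection_points(p)
          for p in (single_soliton, soliton_pair, annihilated)]
```

Because the end values fix the parity of the number of sign changes, a local deformation can change the count only in steps of two. The single crossing therefore survives any such deformation, while the pair can disappear.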

A comparison between the CI and CRO soliton profiles in Figure 3 then suggests the following experimentally fully
testable mechanism for the lysogenic-lytic transition: Under lysogenic conditions where the CI protein prevails, the
soliton-antisoliton pairs of the CI protein that are located immediately before and after the DNA binding domain are
relatively stable. But when there is a change in the environmental conditions that excites phonon fluctuations along
the protein chain, such as a rise in temperature or UV radiation, either of these soliton-antisoliton pairs can discharge
by a saddle-node bifurcation. This bifurcation disturbs the structure of the immediately adjacent DNA binding motif
to the extent that the protein loses its capability to maintain the lysogenic phase. Since each of the corresponding
motifs in the CRO protein is a topologically stable single soliton, they are insensitive to local phonon excitations, and
the lytic phase takes over.

We note that the shoulder of the short α-helix in the last loop is anchored by the presence of a proline at site 78.
Consequently in the first approximation we can safely exclude a bifurcation instability from occurring in the putative
helix-loop-helix motif between the residues (69,90).
We still need to conclude which of the two motifs of CI that are adjacent to the DNA binding domain is the one
that loses its stability in the proposed bifurcation transition. Unfortunately, it appears that a full answer must
wait until computational methods have reached sufficient maturity [9]. However, to provide the probable answer we
have performed a statistical analysis on the occurrence of our seven solitons in all PDB proteins. In Table II we list



TABLE II: The soliton sites used in searching for matching structures in PDB, together with the number of matches. The
search is limited to those x-ray structures that have a resolution better than 2.0 Å, and a match is a configuration that deviates
by less than 0.5 Å in total RMSD distance from the soliton.

Sites     (20,28)  (27,36)  (36,46)  (50,58)  (55,63)  (66,75)  (74,82)
Matches   9601     4        810      159      1552     1342     406

the number of matches that each of these solitons has when we search PDB for configurations that deviate from the
given soliton by an overall RMSD distance less than 0.5 Å. We have chosen this cut-off value since it is representative
of the Debye-Waller fluctuation distance in the experimental 1LMB data. The remarkable observation is that for
the second soliton in the loop preceding the DNA recognition helix, the only matching structures are located in the
different PDB entries of the λ phage CI protein itself. This absence of the second soliton in PDB strongly suggests
that the ensuing loop must be unstable when the protein is in any other in vivo environment. Thus the most probable
source of the lysogenic to lytic transition is the saddle-node bifurcation that takes place in the first loop and causes
its soliton-antisoliton pair to annihilate. The bifurcation causes the ensuing structure to act like a crowbar
that lifts the recognition helix from its place in the DNA groove. We note that, as such, it is obvious from (1) that in
isolation the first soliton-antisoliton pair has an energy which is higher than that of the ground state, i.e. an α-helix.
But a more detailed molecular dynamics simulation needs to be performed to confirm our proposal.

All of the other solitons are ubiquitous in PDB and, except for the turn that participates directly in the regulatory
process, their biophysical role remains to be clarified.

Finally, our soliton interpretation reveals the following fully testable pattern for the folding pathways of 1LMB: The
functionally pertinent DNA binding loop is an isolated soliton, while each of the remaining three loop structures consists
of a soliton-antisoliton pair. Since an isolated soliton is topologically stable while soliton-antisoliton pairs are not,
the DNA binding loop must be created very early, presumably during translation. The initial configuration for the
folding process is then a single soliton state. During the folding process the remaining three motifs are created by
local phononic fluctuations, as soliton-antisoliton pairs. Due to the presence of the proline at site 78, the ensuing
motif probably emerges very early.

A.N. thanks H. Frauenfelder and G. Petsko for communications and J. Aqvist for discussions. 